Abstract

In problems of estimation and control which involve a network, efficient distributed computation of 
averages is a key issue. This paper presents theoretical and simulation results about the accumulation 
of errors during the computation of averages by means of iterative "broadcast gossip" algorithms. 
Using martingale theory, we prove that the expectation of the accumulated error can be bounded
from above by a quantity which depends only on the mixing parameter of the algorithm and on
a few properties of the network: its size, its maximum degree and its spectral gap. Both analytical
results and computer simulations show that in several network topologies of practical interest the
accumulated error goes to zero as the size of the network grows large.

1 Introduction 

Distributed computation of averages is an important building block to solve problems of estimation
and control over networks. Since a reliable, time-independent communication topology is often unavailable
in applications, growing interest has been devoted to randomized "gossip" algorithms to compute
averages. In such algorithms, at each time step, a random subset of the nodes communicates and performs
computations. Unfortunately, some of these iterative algorithms do not deterministically ensure that the
average is preserved through iterations and, due to the accumulation of errors, in general there is no
guarantee that the typical algorithm realization will converge to a value which is close to the desired
average.

In the present paper we focus on one of these randomized algorithms, the Broadcast Gossip Algorithm
(BGA). In this algorithm, a node is randomly selected at each time step to broadcast its current value
to its neighbors, which in turn update their values by a local averaging rule. Since these updates are not
symmetric, the average is not preserved, but is instead changed at each time step by some
amount. In this paper we study how these errors accumulate, and how much the convergence value of
the algorithm differs from the original average to be computed.

1.1 Contribution 

In this paper, we study the bias, or asymptotical error, committed by a Broadcast Gossip averaging
algorithm, and we show that large neighborhoods and a large mixing parameter induce a large
asymptotical error. As a theoretical contribution, we study the average of states as a martingale, and by this
interpretation we prove that on symmetric graphs the expectation of the accumulated error cannot be
larger than a constant times $\frac{q}{1-q}\frac{d_{\max}^2}{N\lambda_1}$, where $q$ is the "mixing" parameter
of the algorithm, $N$ is the network size, $d_{\max}$ is the maximum degree of the nodes, and $\lambda_1$ is
the network spectral gap. For some families of graphs (e.g., expander graphs), this is enough to prove that
the bias goes to zero as $N$ goes to infinity. Further, by means of simulations we show that, on some example
graph topologies, the mean bias is an increasing function of the mixing parameter and is proportional, on
large networks, to the ratio between degree and size of the network. In particular, whenever
$d_{\max} = o(N)$, the simulated bias goes to zero as $N$ goes to infinity.

1.2 Related works 

The paper [9] provides a general theory for randomized linear averaging algorithms, and presents a few
example algorithms, some of which do not preserve the average of the states. Among these algorithms,
the Broadcast Gossip Algorithm, studied in the present paper, has attracted wide interest for its
natural application to wireless networks: main references include the paper [2] and the recent survey [6].
While it is simple to give conditions ensuring that the expectation of the convergence value is equal to the
initial average, the problem of estimating the difference between expectation and realizations is harder,
and has received partial answers in a few papers. In [1] the authors study a related communication model,
in which a broadcast value may fail to be received, with a probability which depends on the transmitter
and receiver nodes, and claim that "aggressive updating combined with large neighborhoods [...] result
in more variance [of the convergence value] within the short time to convergence". This intuition extends
to the Broadcast Gossip Algorithm which we are considering in this paper. Actually, in [2] the variance
of the limit value has been estimated for general graphs, with an upper bound which depends on the
eigenvalues $\lambda_i$ of the graph Laplacian, $\lambda_i$ being the $i$-th smallest eigenvalue. This bound,
however, is not useful to prove that the bias goes to zero as $N$ grows, a fact which has been proved in [7]
for sequences of Abelian Cayley graphs with bounded degree, using tools from algebra and Markov chain
theory. Analogous problems can be studied for other randomized algorithms which do not preserve the
average. For instance, in [8] two related algorithms are studied, in which node values are sent to one 
random neighbor only. If at each time step one random node sends its value, then the variance has an 
upper bound which is proportional to jz^jf, while if at each time step every node sends its value, then 

the bound is proportional to . 

2 Problem statement 

Let a graph $\mathcal{G} = (V, \mathcal{E})$ with $\mathcal{E} \subseteq V \times V$ be given, together with
$N = |V|$ real numbers $\{x_v\}_{v \in V} \subset [0, L]$. For every node $v \in V$, we denote its
out-neighborhood by $\mathcal{N}_v^+ = \{u \in V : (u,v) \in \mathcal{E}\}$, and its in-neighborhood by
$\mathcal{N}_v^- = \{u \in V : (v,u) \in \mathcal{E}\}$. The following Broadcast Gossip Algorithm (BGA) is
run in order to estimate the average $x_{\mathrm{ave}} = N^{-1} \sum_{v \in V} x_v$.

At each time step $t \in \mathbb{Z}_{\ge 0}$, one node $v$ is sampled from a uniform distribution over $V$.
Then, node $v$ broadcasts its state $x_v(t)$ to its neighbors $u \in \mathcal{N}_v^+$, which in turn update
their states as

\[ x_u(t+1) = (1-q)\,x_u(t) + q\,x_v(t). \tag{1} \]

The parameter $q \in (0, 1)$ is said to be the mixing parameter of the algorithm. If instead
$u \notin \mathcal{N}_v^+$, there is no update: $x_u(t+1) = x_u(t)$.
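As an illustration, one iteration of update (1) can be sketched in Python as follows. This is a minimal sketch and not the code used for the simulations reported in this paper: the names `bga_step` and `ring`, the adjacency-list representation `out_neighbors[v]` (the nodes that receive the broadcast of `v`), and the example values are assumptions introduced here.

```python
import random

def bga_step(x, out_neighbors, q, rng):
    """One BGA iteration: a uniformly sampled node v broadcasts, and the
    nodes in out_neighbors[v] apply update (1); all other states are kept."""
    v = rng.randrange(len(x))
    new_x = list(x)
    for u in out_neighbors[v]:
        new_x[u] = (1 - q) * x[u] + q * x[v]
    return new_x, v

def ring(n):
    """Hypothetical helper: symmetric ring, node i is linked to i-1 and i+1."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]

rng = random.Random(0)          # seeded only to make the example repeatable
x0 = [0.0, 1.0, 2.0, 3.0]
x1, v = bga_step(x0, ring(4), q=0.5, rng=rng)
print(v, x1)
```

The function returns a new list instead of updating in place, so that the states before and after the step can be compared directly.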

It is known from [9, Corollary 3.2] that the BGA converges, in the sense that there exists a random
variable $x^*$ such that almost surely $\lim_{t \to +\infty} x(t) = x^* \mathbf{1}$. Let now
$x_{\mathrm{ave}}(t) = N^{-1} \sum_{v \in V} x_v(t)$. Although one can find conditions to ensure that
$\mathbb{E}[x^*] = x_{\mathrm{ave}}(0)$, in general $x^*$ is not equal to $x_{\mathrm{ave}}(0)$. Then,
it is worth asking how far the convergence value is from the initial average. To study this bias in the
computation of the average, we define

\[ \beta(t) = |x_{\mathrm{ave}}(t) - x_{\mathrm{ave}}(0)|^2. \]

The goal of this work is to study this quantity, and in particular its limit
$\mathbb{E}[\beta(\infty)] := \lim_{t \to +\infty} \mathbb{E}[\beta(t)]$,
with special attention to its dependence on the size of the network. In particular, we shall say that the
algorithm is asymptotically unbiased if

\[ \lim_{N \to +\infty} \mathbb{E}[\beta(\infty)] = 0. \]

3 Analysis



3.1 A simplistic bound 

Using (1), it is immediate to compute that, given $v$ to be the broadcasting node at time $t$,

\[ x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(t) = \frac{q}{N} \sum_{u \in \mathcal{N}_v^+} \bigl(x_v(t) - x_u(t)\bigr). \tag{2} \]

Then, we can obtain the following deterministic bound on the error introduced at each time step,

\[ |x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(t)| \le \frac{q\, d_v^+}{N} L \le \frac{q\, d_{\max}^+}{N} L, \tag{3} \]

where $d_v^+ = |\mathcal{N}_v^+|$ is the out-degree of node $v$, and $d_{\max}^+ = \max_{v \in V} d_v^+$. This
simple bound is worth some informal remarks. Indeed, Equation (3) suggests that choosing a low value of the
mixing parameter $q$, and a graph with low degree $d_{\max}^+$ and large size $N$, may ensure a small bias in
the computation of the average. However, by choosing $q$, $N$ or $d_{\max}^+$, one affects the speed of
convergence of the algorithm. Assume one is interested in an accurate computation, and chooses low values for
$q$ and $d_{\max}^+$, compared to $N$. This choice would likely imply a slow convergence, and in turn a slow
convergence may force one to run the algorithm for a larger number of steps, in order to meet the same
precision requirement. These extra steps, however, would introduce extra errors, thus possibly wasting the
desired advantage in accuracy. We argue from this discussion that there is a delicate trade-off between speed
and accuracy for the BGA. The results presented in the next sections will shed light on this trade-off.
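The increment formula (2) and the deterministic bound (3) can be checked numerically on a small example. The following is a sketch under stated assumptions: the six-node symmetric ring, the state vector, and the helper names `broadcast` and `increment_from_eq2` are hypothetical and serve only as an illustration.

```python
def average(x):
    return sum(x) / len(x)

def broadcast(x, out_neighbors, q, v):
    """Apply update (1) with v as the broadcasting node."""
    new_x = list(x)
    for u in out_neighbors[v]:
        new_x[u] = (1 - q) * x[u] + q * x[v]
    return new_x

def increment_from_eq2(x, out_neighbors, q, v):
    """Right-hand side of (2): (q/N) * sum over u in N_v^+ of (x_v - x_u)."""
    return q / len(x) * sum(x[v] - x[u] for u in out_neighbors[v])

# Hypothetical example: symmetric ring of 6 nodes, states in [0, L] with L = 1.
n, q, L = 6, 0.3, 1.0
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
x = [0.0, 1.0, 0.2, 0.9, 0.5, 0.7]
d_max = max(len(nb) for nb in ring)

for v in range(n):
    actual = average(broadcast(x, ring, q, v)) - average(x)
    assert abs(actual - increment_from_eq2(x, ring, q, v)) < 1e-12  # identity (2)
    assert abs(actual) <= q * d_max * L / n + 1e-12                 # bound (3)
```

On this example, a broadcast by node 0 moves the average by $\frac{0.3}{6}\bigl((0-0.7)+(0-1)\bigr) = -0.085$, within the bound $q\, d_{\max}^+ L / N = 0.1$.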

3.2 The average as a martingale 

In this section, we shall derive a general bound on $\mathbb{E}[\beta(\infty)]$ in terms of the topology of the
graph. The derivation is based on applying the theory of martingales to the stochastic processes $x(t)$ and
$x_{\mathrm{ave}}(t)$. The reader can find the essentials of martingale theory in [11] or in [13].

Definition 3.1 Given a sequence (filtration) of $\sigma$-algebras $\{\sigma_n\}_{n \in \mathbb{Z}_{\ge 0}}$, a
sequence of random variables $\{M_n\}_{n \in \mathbb{Z}_{\ge 0}}$ is a $\sigma_n$-martingale if
$\mathbb{E}[M_m \mid \sigma_n] = M_n$ for any $m > n$.

Our first result states that $x_{\mathrm{ave}}(t)$ is a martingale with respect to the filtration induced by
$x(t)$. Before the statement, we need some definitions. Let $d_v^+ = |\mathcal{N}_v^+|$ and
$d_v^- = |\mathcal{N}_v^-|$ be the out-degree and in-degree of node $v$. The graph $\mathcal{G}$ is said to be
balanced if $d_v^- = d_v^+$ for all $v \in V$. Given a set of random variables $X$, we denote by $\sigma(X)$ the
$\sigma$-algebra generated by the random variables in $X$.

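Proposition 3.1 Let us consider the BGA and the filtration $\mathcal{F}_t = \sigma(\{x(s) : s \le t\})$. If
the graph $\mathcal{G}$ is balanced, then the sequence of random variables
$\{x_{\mathrm{ave}}(t)\}_{t \in \mathbb{Z}_{\ge 0}}$ is a square-integrable $\mathcal{F}_t$-martingale.

Proof: First, note that $x_{\mathrm{ave}}(t)$ is $\mathcal{F}_t$-measurable. Moreover, Equation (2) implies that
for all $t \ge 0$,

\[
\begin{aligned}
\mathbb{E}[x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(t) \mid \mathcal{F}_t]
&= \mathbb{E}\Bigl[\frac{q}{N} \sum_{u \in \mathcal{N}_v^+} \bigl(x_v(t) - x_u(t)\bigr) \Bigm| \mathcal{F}_t\Bigr] \\
&= \frac{q}{N^2} \sum_{v \in V} \sum_{u \in \mathcal{N}_v^+} \bigl(x_v(t) - x_u(t)\bigr) \\
&= \frac{q}{N^2} \Bigl(\sum_{v \in V} d_v^+ x_v(t) - \sum_{u \in V} d_u^- x_u(t)\Bigr) \\
&= \frac{q}{N^2} \sum_{v \in V} (d_v^+ - d_v^-)\, x_v(t) \\
&= 0,
\end{aligned}
\]

since we are assuming $d_v^- = d_v^+$ for every $v \in V$. Then, the sequence of random variables
$(x_{\mathrm{ave}}(t))_{t \in \mathbb{Z}_{\ge 0}}$ is an $\mathcal{F}_t$-martingale. Moreover, the fact that
$x_{\mathrm{ave}}(t) \in [0, L]$ implies that the martingale is bounded in $L^p$ for every $p \ge 1$, and in
particular square-integrable. □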
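The role of the balance assumption can be illustrated by computing the conditional expected increment exactly, averaging (2) over the uniformly chosen broadcasting node. The sketch below uses exact rational arithmetic; the two example graphs, the state vector, and the name `expected_increment` are assumptions made for illustration.

```python
from fractions import Fraction

def expected_increment(x, out_neighbors, q):
    """E[x_ave(t+1) - x_ave(t) | x(t)]: average of (2) over a uniform choice of v."""
    n = len(x)
    total = sum(x[v] - x[u] for v in range(n) for u in out_neighbors[v])
    return q * total / (n * n)

q = Fraction(1, 3)
x = [Fraction(k * k, 10) for k in range(5)]   # states 0, 1/10, 4/10, 9/10, 16/10

# Balanced (symmetric) ring: in-degree equals out-degree at every node.
ring = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]
# Unbalanced directed example: only node 0 is ever heard by the others.
star = [[1, 2, 3, 4], [], [], [], []]

print(expected_increment(x, ring, q))   # 0
print(expected_increment(x, star, q))   # -1/25
```

On the balanced ring the expected increment vanishes for any state vector, as in the proof above, while on the unbalanced example the average drifts in expectation.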

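Let us define the distance from the agreement as

\[ d(t) = \frac{1}{N} \sum_{v \in V} \bigl(x_v(t) - x_{\mathrm{ave}}(t)\bigr)^2. \]

Using this definition, we can prove an inequality which is a refinement of (3). Let
$d_{\max}^- = \max_{v \in V}\{d_v^-\}$ and $d_{\max} = \max\{d_{\max}^+, d_{\max}^-\}$.

Lemma 3.2 Let $\mathcal{G}$ be balanced. Then, the increments of the martingale
$\{x_{\mathrm{ave}}(t)\}_{t \in \mathbb{Z}_{\ge 0}}$ have bounded variance; in particular, for all $t \ge 0$,

\[ \mathbb{E}\bigl[(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(t))^2 \mid \mathcal{F}_t\bigr] \le 4\, \frac{q^2 d_{\max}^2}{N^2}\, d(t). \tag{4} \]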

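Proof: By (2), we have

\[
\begin{aligned}
\mathbb{E}\bigl[(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(t))^2 \mid \mathcal{F}_t\bigr]
&= \frac{1}{N} \sum_{v \in V} \frac{q^2}{N^2} \Bigl(\sum_{u \in \mathcal{N}_v^+} \bigl(x_v(t) - x_u(t)\bigr)\Bigr)^2 \\
&\le \frac{1}{N} \sum_{v \in V} \frac{q^2}{N^2}\, d_v^+ \sum_{u \in \mathcal{N}_v^+} \bigl(x_v(t) - x_u(t)\bigr)^2 \\
&\le \frac{q^2}{N^2} \Bigl(2\, (d_{\max}^+)^2\, d(t) + 2\, d_{\max}^+ d_{\max}^-\, d(t)\Bigr) \\
&\le 4\, \frac{q^2 d_{\max}^2}{N^2}\, d(t).
\end{aligned}
\]

Here the first inequality is the Cauchy-Schwarz inequality, and the second one uses
$(x_v - x_u)^2 \le 2 (x_v - x_{\mathrm{ave}})^2 + 2 (x_u - x_{\mathrm{ave}})^2$ together with the fact that each
node appears in at most $d_{\max}^-$ out-neighborhoods. This completes the proof. □

We define the rate of convergence of the algorithm as

\[ R := \sup_{x(0)} \limsup_{t \to +\infty} \mathbb{E}[d(t)]^{1/t}. \]

Then, there exists a positive constant $C_R$, depending on $x(0)$, such that $\mathbb{E}[d(t)] \le C_R R^t$.
This fact, combined with Lemma 3.2, implies that

\[
\mathbb{E}\bigl[(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(t))^2\bigr]
= \mathbb{E}\Bigl[\mathbb{E}\bigl[(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(t))^2 \mid \mathcal{F}_t\bigr]\Bigr]
\le 4\, C_R\, \frac{q^2 d_{\max}^2}{N^2}\, R^t. \tag{5}
\]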
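Inequality (4) can be checked numerically by enumerating the broadcasting node. The sketch below makes two assumptions that should be read as such: the disagreement is taken to be $d(t) = N^{-1} \sum_v (x_v(t) - x_{\mathrm{ave}}(t))^2$, and the test graph is a symmetric ring with random states; the function names are hypothetical.

```python
import random

def disagreement(x):
    """d(t) = (1/N) * sum_v (x_v - x_ave)^2 (normalization assumed here)."""
    m = sum(x) / len(x)
    return sum((xv - m) ** 2 for xv in x) / len(x)

def second_moment_of_increment(x, nbrs, q):
    """E[(x_ave(t+1) - x_ave(t))^2 | x(t)], enumerating the broadcaster v via (2)."""
    n = len(x)
    incs = [q / n * sum(x[v] - x[u] for u in nbrs[v]) for v in range(n)]
    return sum(i * i for i in incs) / n

def lemma_bound(x, nbrs, q):
    """Right-hand side of (4); on a symmetric graph d_max is the largest degree."""
    n = len(x)
    d_max = max(len(nb) for nb in nbrs)
    return 4 * q ** 2 * d_max ** 2 / n ** 2 * disagreement(x)

rng = random.Random(0)
n, q = 12, 0.7
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
for trial in range(100):
    x = [rng.random() for k in range(n)]
    assert second_moment_of_increment(x, ring, q) <= lemma_bound(x, ring, q) + 1e-15
```

Such a check does not replace the proof, but it helps to catch a wrong power of $N$ or of $d_{\max}$ when the bound is transcribed.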



We recall that the spectral gap of the graph $\mathcal{G}$ is the smallest (in modulus) non-zero eigenvalue of
its Laplacian matrix, and we denote this quantity by $\lambda_1$. It is well known that $\lambda_1$ relates to
the mixing rate of Markov chains, and to the speed of convergence of gossip algorithms [3, 9]: the larger the
spectral gap, the faster the convergence. In particular, let us assume that the graph $\mathcal{G}$ is
symmetric, that is, such that $\mathcal{N}_v^+ = \mathcal{N}_v^-$ for all $v \in V$. Then, we know from
[7, Equation (18)] that

\[ R \le 1 - \frac{2q(1-q)}{N}\, \lambda_1. \tag{6} \]

Using these facts, we are going to prove the next result about the asymptotic behavior of the bias as
$t \to +\infty$.

Proposition 3.3 If $\mathcal{G}$ is symmetric, then there exists a constant $C > 0$ such that

\[ \mathbb{E}[\beta(\infty)] \le C\, \frac{q}{1-q}\, \frac{d_{\max}^2}{N \lambda_1}. \]

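Proof: Using the orthogonality of the increments of square-integrable martingales, we observe that

\[
\begin{aligned}
\lim_{t \to +\infty} \mathbb{E}\bigl[(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(0))^2\bigr]
&= \lim_{T \to +\infty} \mathbb{E}\Bigl[\Bigl(\sum_{t=0}^{T-1} \bigl(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(t)\bigr)\Bigr)^2\Bigr] \\
&= \lim_{T \to +\infty} \mathbb{E}\Bigl[\sum_{t=0}^{T-1} \bigl(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(t)\bigr)^2 \\
&\qquad + 2 \sum_{t=1}^{T-1} \sum_{s<t} \bigl(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(t)\bigr)\bigl(x_{\mathrm{ave}}(s+1) - x_{\mathrm{ave}}(s)\bigr)\Bigr] \\
&= \lim_{T \to +\infty} \Bigl[\sum_{t=0}^{T-1} \mathbb{E}\bigl[(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(t))^2\bigr] \\
&\qquad + 2 \sum_{t=1}^{T-1} \sum_{s<t} \mathbb{E}\bigl[(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(t))(x_{\mathrm{ave}}(s+1) - x_{\mathrm{ave}}(s))\bigr]\Bigr] \\
&= \lim_{T \to +\infty} \sum_{t=0}^{T-1} \mathbb{E}\bigl[(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(t))^2\bigr].
\end{aligned}
\]

By applying Equation (5),

\[
\lim_{t \to +\infty} \mathbb{E}\bigl[(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(0))^2\bigr]
\le 4\, C_R\, \frac{q^2 d_{\max}^2}{N^2} \lim_{T \to +\infty} \sum_{t=0}^{T-1} R^t
= 4\, C_R\, \frac{q^2 d_{\max}^2}{N^2}\, \frac{1}{1-R}.
\]

The inequality in (6) implies that

\[
\lim_{t \to +\infty} \mathbb{E}\bigl[(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(0))^2\bigr]
\le 2\, C_R\, \frac{q}{1-q}\, \frac{d_{\max}^2}{N \lambda_1}.
\]

The thesis then follows, with $C = 2 C_R$, by applying the dominated convergence theorem. □

Note that the proof of Proposition 3.3 also implies that

\[ \sup_{t} \mathbb{E}\bigl[(x_{\mathrm{ave}}(t+1) - x_{\mathrm{ave}}(0))^2\bigr] \le C\, \frac{q}{1-q}\, \frac{d_{\max}^2}{N \lambda_1}. \]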

On the other hand, for a convergent square-integrable martingale $M_t$, we know by Doob's maximal
inequality that $\mathbb{E}[\sup_t M_t^2] \le 4\, \mathbb{E}[\lim_t M_t^2]$. Then, we can immediately obtain the
following finite-time counterpart of Proposition 3.3.

Theorem 3.4 If $\mathcal{G}$ is symmetric, then there exists $C' > 0$ such that

\[ \mathbb{E}\Bigl[\sup_{t \in \mathbb{N}} \beta(t)\Bigr] \le C'\, \frac{q}{1-q}\, \frac{d_{\max}^2}{N \lambda_1}. \]



Let us now consider a sequence of graphs $\mathcal{G}_N$ of increasing size $N$. In such a sequence, both
$d_{\max}$ and $\lambda_1$ are functions of $N$. In this context, Proposition 3.3 implies the following corollary.

Corollary 3.5 Let $I \subseteq \mathbb{N}$ and $(\mathcal{G}_N)_{N \in I}$ be a sequence of symmetric graphs such
that $\mathcal{G}_N = (V_N, \mathcal{E}_N)$ and $|V_N| = N$. If

\[ \frac{d_{\max}^2}{\lambda_1} = o(N) \quad \text{as } N \to +\infty, \]

then the BGA is asymptotically unbiased.

Note that, since $x_{\mathrm{ave}}(t)$ converges a.s. and $\lim_{t \to +\infty} x_{\mathrm{ave}}(t) \in [0, L]$,
it is trivial to find an upper bound on $\mathbb{E}[\beta(\infty)]$ which does not depend on $N$. The interest
of the above corollary is in giving a sufficient condition for $\mathbb{E}[\beta(\infty)]$ to be $o(1)$ as
$N \to \infty$.

Remark 3.6 Applying Markov's inequality, we see from Proposition 3.3 that for any $c > 0$,

\[ \mathbb{P}[\beta(\infty) > c] \le C\, \frac{q}{1-q}\, \frac{d_{\max}^2}{N \lambda_1}\, \frac{1}{c}. \]

In applications, one is often interested in computing an average because the average is the maximum
likelihood estimator of the expectation of a random variable. In such a context, the average enjoys the
property that the mean square error, committed by approximating the expectation by the average of
$N$ samples, is proportional to $1/N$. For this reason, one would like to ensure that the bias introduced by the
Broadcast Gossip Algorithm is not larger than $1/N$. If we take $c = 1/N$, we get

\[ \mathbb{P}\Bigl[\beta(\infty) > \frac{1}{N}\Bigr] \le C\, \frac{q}{1-q}\, \frac{d_{\max}^2}{\lambda_1}. \]

Then, provided the right-hand side of this inequality does not diverge, we can choose the mixing parameter
$q$ so that with a given positive probability the bias is below $1/N$. In such a case, if our purpose is
distributed estimation of an expected value, averages which are approximated by a Broadcast Gossip Algorithm
are as good as averages computed by a centralized method. □
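The choice of $q$ suggested in Remark 3.6 can be made explicit: requiring $C \frac{q}{1-q} \frac{d_{\max}^2}{\lambda_1} \le p$ is equivalent to $q \le a/(1+a)$ with $a = p \lambda_1 / (C d_{\max}^2)$. The sketch below only illustrates this arithmetic. The constant $C$ of Proposition 3.3 is not given explicitly in the text, so the default `C=1.0` is a placeholder assumption, as are the function names and the numerical values.

```python
def bound_factor(q, d_max, lam1):
    """q/(1-q) * d_max^2 / lam1, the N-independent factor appearing in Remark 3.6."""
    return q / (1 - q) * d_max ** 2 / lam1

def largest_q(p, d_max, lam1, C=1.0):
    """Largest q in (0, 1) such that C * bound_factor(q, d_max, lam1) <= p.
    C is a placeholder for the unspecified constant of Proposition 3.3."""
    a = p * lam1 / (C * d_max ** 2)
    return a / (1 + a)

# Hypothetical values: target probability 0.5, degree 4, spectral gap 0.5.
q_star = largest_q(p=0.5, d_max=4, lam1=0.5)
print(q_star, bound_factor(q_star, d_max=4, lam1=0.5))
```

With these hypothetical values, $a = 1/64$ and the threshold is $q = 1/65$. The threshold shrinks as $d_{\max}^2/\lambda_1$ grows, which is the trade-off between accuracy and speed discussed in Section 3.1.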



4 Examples 

In this section, we show that the BGA is asymptotically unbiased on several example topologies which
have been considered in the literature. Given a graph $\mathcal{G}_N$ of size $N$, we denote by $\lambda_1(N)$
and $d_{\max}(N)$ its spectral gap and maximum degree, respectively. We consider the following example
sequences of graphs.
• Expander graphs.

A sequence of graphs is said to be an expander sequence if there exist $d \in \mathbb{N}$ and $c > 0$ such that,
for every $N$, $\lambda_1(N) \ge c$ and $d_{\max}(N) \le d$. In this case, $\frac{d_{\max}^2}{\lambda_1}$ is
bounded, and this fact implies by Proposition 3.3 that

\[ \mathbb{E}[\beta(\infty)] = O\Bigl(\frac{1}{N}\Bigr) \quad \text{as } N \to +\infty, \]

and then the BGA is asymptotically unbiased and Remark 3.6 applies. An example of an expander
sequence is given by a sequence of de Bruijn graphs on $n$ symbols of increasing dimension $k$. A de
Bruijn graph on $n$ symbols of dimension $k$ has $n^k$ vertices and edges from any $i$ to
$ni, ni+1, ni+2, \dots, ni+n-1$ (all modulo $n^k$). Their expander properties have already been applied to
efficient averaging algorithms in [5].

• Hypercube graphs.

The $n$-dimensional hypercube graph is the graph obtained by drawing the edges of an $n$-dimensional
hypercube. It has $N = 2^n$ nodes which can be identified with the binary words of length $n$, and two
nodes are neighbors if the corresponding binary words differ in exactly one component. For these graphs
it is known, for instance from [10, Example 7], that $\lambda_1(N) = \Theta(1/\log N)$ and
$d_{\max}(N) = \Theta(\log N)$. Then,

\[ \mathbb{E}[\beta(\infty)] = O\Bigl(\frac{\log^3 N}{N}\Bigr) \quad \text{as } N \to +\infty, \]

and the BGA is asymptotically unbiased.



• $k$-dimensional square lattices.

We consider square lattices obtained by tiling a $k$-dimensional torus, with $N = n^k$ nodes. For these
graphs we know [4, Theorem 6] that $\lambda_1(N) = \Theta(1/N^{2/k})$ and $d_{\max}(N) = 2k$. Then,

\[ \mathbb{E}[\beta(\infty)] = O\Bigl(\frac{1}{N^{1-2/k}}\Bigr) \quad \text{as } N \to +\infty, \]

and we argue that $k$-lattices are asymptotically unbiased if $k \ge 3$. However, we know that the BGA
is asymptotically unbiased also if $k = 1, 2$: this has been proved in [7] using different techniques.

• Random geometric graphs.

We can also consider random sequences of geometric graphs constructed as follows. We sample $N$
points from a uniform distribution over the unit square, and we let nodes $i$ and $j$ be connected if
the two corresponding points in the square are less than $r(N)$ apart, with
$r(N) = 1.1 \sqrt{\frac{\log N}{N}}$. For these graphs, we know from [12] that with high probability
$\lambda_1(N) = \Theta(1/N)$ and $d_{\max}(N) = \Theta(\log N)$. Then, with high probability
$\mathbb{E}[\beta(\infty)] = O(\log^2 N)$ as $N \to +\infty$, and we cannot conclude
asymptotical unbiasedness.

• Complete graphs.

For these graphs, $\lambda_1(N) = N$ and $d_{\max}(N) = N - 1$.

Then, we cannot conclude from Proposition 3.3 that the BGA is asymptotically unbiased on
complete graphs. Actually, in [9] it is shown that the BGA is not asymptotically unbiased on
complete graphs, and in particular

\[ \mathbb{E}[\beta(\infty)] = \mathrm{Var}(x(0))\, \frac{q}{2-q}\, \frac{N-1}{N}, \tag{7} \]

where by $\mathrm{Var}(x(0))$ we denote the (sample) variance of the initial condition.
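The complete-graph case lends itself to a quick Monte Carlo check. The sketch below compares the simulated mean bias, for a fixed initial condition, with formula (7) read as $\mathrm{Var}(x(0))\, \frac{q}{2-q}\, \frac{N-1}{N}$, where $\mathrm{Var}$ is the sample variance normalized by $N-1$. The stopping rule on $d(t) = N^{-1}\sum_v (x_v(t) - x_{\mathrm{ave}}(t))^2$, the tolerance, the number of runs, and all function names are assumptions introduced for this illustration.

```python
import random

def run_complete_bga(x0, q, rng, tol=1e-12):
    """Run the BGA on a complete graph until d(t) < tol; return beta at that time."""
    n = len(x0)
    x = list(x0)
    ave0 = sum(x) / n
    while True:
        ave = sum(x) / n
        if sum((xv - ave) ** 2 for xv in x) / n < tol:
            return (ave - ave0) ** 2
        v = rng.randrange(n)
        xv = x[v]
        # Every node except the broadcaster v applies update (1).
        x = [xv if u == v else (1 - q) * x[u] + q * xv for u in range(n)]

def formula_7(x0, q):
    """Var(x(0)) * q/(2-q) * (N-1)/N, with Var normalized by N-1."""
    n = len(x0)
    m = sum(x0) / n
    var = sum((xv - m) ** 2 for xv in x0) / (n - 1)
    return var * q / (2 - q) * (n - 1) / n

rng = random.Random(1)
x0 = [rng.random() for k in range(10)]
q, runs = 0.5, 4000
estimate = sum(run_complete_bga(x0, q, rng) for r in range(runs)) / runs
print(estimate, formula_7(x0, q))
```

On a complete graph every broadcast shrinks all deviations from the current average by the factor $1-q$, so each run stops after a few tens of steps and the two printed numbers should agree up to Monte Carlo error.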
5 Simulations



We have extensively simulated the evolution of the BGA on sequences of graphs, and in particular
on the example topologies presented in Section 4. In this section, we account for our results about the
dependence of the bias $\beta(\infty)$ on the size $N$ and on the parameter $q$.

Our simulation setup is as follows. Let $q$, $N$ and the graph topology be chosen. For every run of
the algorithm we generate a vector of initial conditions $x(0)$, sampling from a uniform distribution over
$[0,1]$. Then, we run the algorithm until the disagreement $d(t)$ is below a small threshold $\varepsilon$,
which we set at $10^{-4}$. At this time $T_\varepsilon = \inf\{t \ge 0 : d(t) < \varepsilon\}$, the algorithm is
stopped, and $\beta(T_\varepsilon)$ is evaluated. In order to estimate the expectation
$\mathbb{E}[\beta(\infty)]$, we average the outcome of 1000 realizations of $\beta(T_\varepsilon)$.
Our results about the dependence on $N$ are summarized in Figure 1, which plots the average bias against
$N$ in a log-log diagram. As expected, complete graphs are not asymptotically unbiased, while all other
topologies, in which the degree is $o(N)$, are asymptotically unbiased. In particular, for de Bruijn graphs
on 2 symbols, ring graphs and torus graphs, the bias is $\Theta(N^{-1})$, whereas for hypercubes and random
geometric graphs the bias is $\Theta\bigl(\frac{\log N}{N}\bigr)$. Overall, our set of simulations suggests that

\[ \mathbb{E}[\beta(\infty)] = \Theta\Bigl(\frac{d_{\max}}{N}\Bigr) \quad \text{as } N \to \infty. \]

Results presented in Figure 2 confirm that this asymptotical law is independent of the choice of $q$, provided
$q < 1$. Note indeed that if we run the BGA with $q = 1$ in the update (1), the convergence value is always
one of the initial values, sampled according to a uniform distribution. This implies that, if $q = 1$, then
$\mathbb{E}[\beta(\infty)] = \mathrm{Var}(x_u(0)) = 1/12$. If instead $q \in (0, 1)$, simulations in Figure 3
show that the bias $\mathbb{E}[\beta(\infty)]$ is an increasing function of the mixing parameter.
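The setup described in this section can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the code used to produce the figures: the ring topology with $N = 8$, the reduced number of runs, the seed, the normalization $d(t) = N^{-1}\sum_v (x_v(t) - x_{\mathrm{ave}}(t))^2$, and the names `disagreement` and `simulate_bias` are all choices made here.

```python
import random

def disagreement(x):
    """d(t) = (1/N) * sum_v (x_v - x_ave)^2 (normalization assumed here)."""
    m = sum(x) / len(x)
    return sum((xv - m) ** 2 for xv in x) / len(x)

def simulate_bias(nbrs, q, runs, rng, eps=1e-4):
    """Monte Carlo estimate of E[beta(T_eps)] following the setup of this section."""
    n = len(nbrs)
    total = 0.0
    for run in range(runs):
        x = [rng.random() for k in range(n)]   # x(0) uniform on [0, 1]
        ave0 = sum(x) / n
        while disagreement(x) >= eps:          # stop at T_eps
            v = rng.randrange(n)
            xv = x[v]
            for u in nbrs[v]:
                x[u] = (1 - q) * x[u] + q * xv  # update (1)
        total += (sum(x) / n - ave0) ** 2       # beta(T_eps)
    return total / runs

n = 8
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
rng = random.Random(0)
low = simulate_bias(ring, q=0.2, runs=300, rng=rng)
high = simulate_bias(ring, q=0.8, runs=300, rng=rng)
print(low, high)
```

With these settings the estimate for $q = 0.8$ should come out clearly larger than the one for $q = 0.2$, in line with the monotone dependence on $q$ reported above. Reproducing the scaling in $N$ would require larger graphs and more realizations.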

[Figure 1 plot: log-log diagram, vertical axis "Average Bias", horizontal axis $N$.]

Figure 1: Average bias $\beta(\infty)$ as a function of the graph size $N$, for $q = 0.5$ and on different
topologies (various marks). Solid lines represent $\Theta(N^{-1/2})$, $\Theta(\log N/N)$, $(\log N/N)$ and
$\Theta(N^{-1})$, respectively.



Footnote 1. If the topology is random, namely a random geometric topology as described above, it is also
sampled at this stage. Disconnected realizations are discarded: however, disconnected realizations are very
few in our random geometric setting and their number decreases as $N$ grows, so that they are less than 2%
when $N > 50$.

Figure 2: Average bias $\beta(\infty)$ as a function of the size $N$, on a sequence of de Bruijn graphs on 2
symbols, for different values of $q$. The solid horizontal line represents the theoretical value
$\beta(\infty) = 1/12$ obtained for $q = 1$.



6 Conclusion 

In this paper, we have proven that on any symmetric graph,
$\mathbb{E}[\beta(\infty)] = O\bigl(\frac{d_{\max}^2}{N \lambda_1}\bigr)$, and in particular the
BGA is asymptotically unbiased on expander graphs. On the other hand, simulations suggest that, on
sequences of (almost) regular graphs with degree $d(N)$, the bias is such that
$\mathbb{E}[\beta(\infty)] = \Theta\bigl(\frac{d(N)}{N}\bigr)$. Our
future research will be devoted to finding a bound on $\mathbb{E}[\beta(\infty)]$ which ensures asymptotic
unbiasedness on a wider set of topologies, and more generally to studying the trade-offs between speed and
accuracy in gossip algorithms.



Acknowledgements 

P. Frasca wishes to thank M. Boccuzzi for his advice in implementing the simulations and S. Zampieri 
for finding an error in an earlier draft. 



References 

[1] T. C. Aysal, A. D. Sarwate, and A. G. Dimakis. Reaching consensus in wireless networks with 
probabilistic broadcast. In Allerton Conf. on Communications, Control and Computing, Allerton, 
IL, USA, September 2009. 

[2] T. C. Aysal, M. E. Yildiz, A. D. Sarwate, and A. Scaglione. Broadcast gossip algorithms for
consensus. IEEE Transactions on Signal Processing, 57(7):2748-2761, 2009.

[3] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized gossip algorithms. IEEE Transactions 
on Information Theory, 52(6):2508-2530, 2006. 



[Figure 3 plot: vertical axis "Average Bias"; legend: Torus, Ring, Hypercube, De Bruijn, Complete,
Random Geometric.]

Figure 3: Average bias $\beta(\infty)$ as a function of the mixing parameter $q$, for $N = 64$ and on different
topologies (various marks). The solid line is (7), confirming the theory in the complete case.